Compare Page

Semantic consistency

Characteristic Name: Semantic consistency
Dimension: Consistency
Description: Data is semantically consistent
Granularity: Element
Implementation Type: Rule-based approach
Characteristic Type: Declarative

Verification Metric:

The number of semantically inconsistent data items reported per thousand records
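As a minimal sketch of how this metric could be computed, the helper below normalizes a count of reported inconsistencies to a per-thousand rate. The function name and its parameters are assumptions for illustration and do not come from the source.

```python
def inconsistencies_per_thousand(inconsistent_count: int, total_records: int) -> float:
    """Return the number of semantically inconsistent items per 1,000 records."""
    if total_records <= 0:
        raise ValueError("total_records must be positive")
    if inconsistent_count < 0:
        raise ValueError("inconsistent_count must not be negative")
    return 1000 * inconsistent_count / total_records


# Example: 12 inconsistent items reported in 4,000 records.
rate = inconsistencies_per_thousand(12, 4000)
```

Normalizing to a fixed base of 1,000 records keeps the figure comparable between tables of different sizes.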


The implementation guidelines are practices to follow in regard to the characteristic. The scenarios are examples of their implementation.

Guideline: Ensure that the semantics of data are consistent within and across applications.
Scenario: (1) All orders placed by customers are called “Sales order” in all tables/databases. (2) Anti-example: Payment type (Check) recorded together with Payment details (Card type, Card number).

Guideline: Maintain a data dictionary or standard vocabularies of data semantics.
Scenario: (1) The data dictionary provides technical data as well as the semantics of the data.
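The payment anti-example can be expressed as a declarative rule. The sketch below is one possible rule-based check, assuming a hypothetical record layout with the fields `payment_type`, `card_type`, and `card_number`; none of these names come from the source.

```python
# Hypothetical field names, chosen only for illustration.
CARD_FIELDS = ("card_type", "card_number")


def is_semantically_consistent(order: dict) -> bool:
    """Return False when a 'Check' payment also carries card details."""
    has_card_details = any(order.get(field) for field in CARD_FIELDS)
    if order.get("payment_type") == "Check":
        return not has_card_details
    return True


orders = [
    {"payment_type": "Check", "card_type": None, "card_number": None},
    {"payment_type": "Check", "card_type": "Visa", "card_number": "0000"},
    {"payment_type": "Card", "card_type": "Visa", "card_number": "0000"},
]

# Collect the records that violate the rule so they can be reported.
violations = [order for order in orders if not is_semantically_consistent(order)]
```

Only the second record is flagged here, because it combines a check payment with card details. In practice the rule and the permitted values would be taken from the data dictionary rather than hard-coded.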

Validation Metric:

How mature is the creation and implementation of the DQ rules to maintain semantic consistency?

These are examples of how the characteristic might occur in a database.

Example: School admin: a student’s date of birth has the same value and format in the school register as that stored within the Student database.
Source: N. Askham, et al., “The Six Primary Dimensions for Data Quality Assessment: Defining Data Quality Dimensions”, DAMA UK Working Group, 2013.

Example: A company has a color field that only records red, blue, and yellow. A new requirement makes them decide to break each of these colors down to multiple shadings and thus institute a scheme of recording up to 30 different colors, all of which are variations of red, blue, and yellow. None of the old records are updated to the new scheme, as only new records use it. This database will have inconsistency of representation of color that crosses a point in time.
Source: J. E. Olson, “Data Quality: The Accuracy Dimension”, Morgan Kaufmann Publishers, 9 January 2003.

The Definitions are examples of the characteristic that appear in the sources provided.

Definition: Data about an object or event in one data store is semantically equivalent to data about the same object or event in another data store.
Source: ENGLISH, L. P. 2009. Information quality applied: Best practices for improving business information, processes and systems, Wiley Publishing.

Definition: Data is consistent if it doesn’t convey heterogeneity, neither in contents nor in form. Anti-examples: Order.Payment.Type = ‘Check’; Order.Payment.CreditCard_Nr = 4252… (inconsistency in contents); Order.requested_by: ‘European Central Bank’; Order.delivered_to: ‘ECB’ (inconsistency in form, because in the first case the customer is identified by the full name, while in the second case the customer’s acronym is used).
Source: KIMBALL, R. & CASERTA, J. 2004. The data warehouse ETL toolkit: practical techniques for extracting, cleaning, conforming, and delivering data, Digitized Format, originally published.

Definition: The extent of consistency in using the same values (vocabulary control) and elements to convey the same concepts and meanings in an information object. This also includes the extent of semantic consistency among the same or different components of the object.
Source: STVILIA, B., GASSER, L., TWIDALE, M. B. & SMITH, L. C. 2007. A framework for information quality assessment. Journal of the American Society for Information Science and Technology, 58, 1720-1733.
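The "inconsistency in form" case and the idea of vocabulary control can be illustrated with a small lookup that maps every known variant of a name to one canonical form. This is a sketch under stated assumptions: the `CANONICAL_NAMES` table and both helper functions are hypothetical, and only the two customer values are taken from the definition above.

```python
# Hypothetical controlled vocabulary: each known variant maps to one canonical name.
CANONICAL_NAMES = {
    "european central bank": "European Central Bank",
    "ecb": "European Central Bank",
}


def canonical_name(raw: str) -> str:
    """Return the canonical form of a name, or the trimmed input if it is unknown."""
    cleaned = raw.strip()
    return CANONICAL_NAMES.get(cleaned.lower(), cleaned)


def same_customer(first: str, second: str) -> bool:
    """Compare two name values after mapping both to their canonical form."""
    return canonical_name(first) == canonical_name(second)


requested_by = "European Central Bank"
delivered_to = "ECB"

differs_in_form = requested_by != delivered_to               # raw values differ
refers_to_same = same_customer(requested_by, delivered_to)   # meaning is the same
```

A record where the raw values differ but the canonical forms match is a candidate for the form inconsistency described above. Unknown names pass through unchanged, so gaps in the vocabulary do not cause false matches.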

 

Continuity of data access

Characteristic Name: Continuity of data access
Dimension: Availability and Accessibility
Description: The technology infrastructure should not impede the speed and continuity of access to the data for the users
Granularity: Information object
Implementation Type: Process-based approach
Characteristic Type: Usage

Verification Metric:

The number of tasks that failed or underperformed due to a lack of continuity in data access
The number of complaints received due to lack of continuity in data access


The implementation guidelines are practices to follow in regard to the characteristic. The scenarios are examples of their implementation.

Guideline: A convenient and efficient platform should be made available to access data, depending on the task at hand.
Scenario: (1) For a salesperson, a web-based interface running on a smart device is more suitable for accessing data quickly.

Guideline: The speed of data retrieval should be acceptable for the users' working pace.
Scenario: (1) For an online customer care executive, speedy retrieval of information is necessary since the customer cannot be kept waiting. (2) Anti-example: as the database grows, reports become slower.

Guideline: Continuous and unobstructed connectivity should be ensured for data retrievals.
Scenario: (1) Anti-example: the connection is lost while accessing reports.

Guideline: Proper concurrency control should be implemented.
Scenario: (1) Controlling access to data with locks.

Guideline: Technological changes in the infrastructure/system should be handled in such a way that they do not make data inaccessible.
Scenario: (1) A new version of the software does not provide access to "X out orders" since the new version does not support the "X out" function.
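The concurrency-control scenario ("controlling access to data by locks") can be sketched with Python's standard `threading.Lock`. The `OrderStore` class, its methods, and the item name are hypothetical and stand in for whatever shared data the real system protects.

```python
import threading


class OrderStore:
    """Hypothetical in-memory store whose data is guarded by a single lock."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._quantities: dict[str, int] = {}

    def add(self, item: str, quantity: int) -> None:
        # The read-modify-write below is done while holding the lock,
        # so concurrent writers cannot overwrite each other's updates.
        with self._lock:
            self._quantities[item] = self._quantities.get(item, 0) + quantity

    def read(self, item: str) -> int:
        with self._lock:
            return self._quantities.get(item, 0)


def worker(store: OrderStore, repetitions: int) -> None:
    for _ in range(repetitions):
        store.add("widget", 1)


store = OrderStore()
threads = [threading.Thread(target=worker, args=(store, 1000)) for _ in range(4)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

total = store.read("widget")
```

A single coarse lock is the simplest choice and keeps the data correct, but it serializes all access. Under heavy load that can slow retrieval, so a production system might prefer finer-grained locking or the concurrency control provided by its database.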

Validation Metric:

How mature is the process of maintaining an infrastructure for data access?

These are examples of how the characteristic might occur in a database.

Example: (1) For example, recording the age and race in medical records may be appropriate. However, it may be illegal to collect this information in human resources departments. (2) For example, the best and easiest method to obtain demographic information may be to obtain it from an existing system. Another method may be to assign data collection by the expertise of each team member. For example, the admission staff collects demographic data, the nursing staff collects symptoms, and the HIM staff assigns codes. Team members should be assigned accordingly.
Source: B. Cassidy, et al., “Practice Brief: Data Quality Management Model” in Journal of AHIMA, 1998, 69(6).

The Definitions are examples of the characteristic that appear in the sources provided.

Definition: (1) Is there a continuous and unobstructed way to get to the information? (2) Can the infrastructure match the user’s working pace?
Source: EPPLER, M. J. 2006. Managing information quality: increasing the value of information in knowledge-intensive products and processes, Springer.

Definition: Data is easy and quick to retrieve.
Source: PRICE, R. J. & SHANKS, G. Empirical refinement of a semiotic information quality framework. System Sciences, 2005. HICSS'05. Proceedings of the 38th Annual Hawaii International Conference on, 2005. IEEE, 216a-216a.

Definition: (1) Availability of a data source or a system. (2) Accessibility expresses how much data are available or quickly retrievable. (3) The frequency of failures of a system, its fault tolerance.
Source: SCANNAPIECO, M. & CATARCI, T. 2002. Data quality under a computer science perspective. Archivi & Computer, 2, 1-15.